36 research outputs found

    Location-Based Plant Species Prediction Using A CNN Model Trained On Several Kingdoms - Best Method Of GeoLifeCLEF 2019 Challenge

    This technical report describes the model that achieved the best performance in the GeoLifeCLEF challenge, whose objective was to evaluate methods for predicting plant species from their geographical location. Our method is based on an adaptation of the Inception v3 architecture, originally designed for the classification of RGB images. We modified the input layer of this architecture to process the spatialized environmental tensors as images with 77 distinct channels. Using this architecture, we trained several models that differed mainly in the training data used and in the predicted output classes. One of the main objectives, in particular, was to compare the performance of a model trained on plant occurrences only with that of a model trained on all available occurrences, including species of other kingdoms. Our results show that the global model performs consistently better than the plant-specific model. This suggests that the convolutional neural network is able to capture some inter-dependencies among all species and that this information significantly improves the generalisation capacity of the model for any species.

    Overview of LifeCLEF location-based species prediction task 2020 (GeoLifeCLEF)

    Understanding the geographic distribution of species is a key concern in conservation. By pairing species occurrences with environmental features, researchers can model the relationship between an environment and the species that may be found there. To advance the state of the art in this area, a large-scale machine learning competition called GeoLifeCLEF 2020 was organized. It relied on a dataset of 1.9 million species observations paired with high-resolution remote sensing imagery, land cover data, and altitude, in addition to traditional low-resolution climate and soil variables. This paper presents an overview of the competition, synthesizes the approaches used by the participating groups, and analyzes the main results. In particular, we highlight the ability of remote sensing imagery and convolutional neural networks to improve predictive performance, complementing traditional approaches.

    The James Webb Space Telescope Mission

    Twenty-six years ago a small committee report, building on earlier studies, expounded a compelling and poetic vision for the future of astronomy, calling for an infrared-optimized space telescope with an aperture of at least 4 m. With the support of their governments in the US, Europe, and Canada, 20,000 people realized that vision as the 6.5 m James Webb Space Telescope. A generation of astronomers will celebrate their accomplishments for the life of the mission, potentially as long as 20 years, and beyond. This report and the scientific discoveries that follow are extended thank-you notes to the 20,000 team members. The telescope is working perfectly, with much better image quality than expected. In this and accompanying papers, we give a brief history, describe the observatory, outline its objectives and current observing program, and discuss the inventions and people who made it possible. We cite detailed reports on the design and the measured performance on orbit. Comment: Accepted by PASP for the special issue on The James Webb Space Telescope Overview, 29 pages, 4 figures

    Interprétabilité des modèles de distribution d’espèces basés sur des réseaux de neurones convolutifs

    Species distribution models link the geographic distribution of a species to its environment. These models serve multiple objectives. They can be used to extract knowledge about species and their environmental preferences, to support conservation plans and policies, to monitor and anticipate the spread of invasive species, or to simulate environmental changes and their impacts on species. To best meet these objectives, it is necessary to design efficient, accurate and interpretable models. Most of the models used today are relatively simple. They have the advantage of being easy to interpret because they produce simple relationships between a species and its environment. However, they often share some shortcomings, such as sensitivity to overfitting, which requires a careful choice of the data describing the environment to avoid interpretation errors. Models based on machine learning approaches have shown performance that is often as good or even better, with stronger robustness against overfitting. However, these methods are more often criticized for their lack of interpretability. This is the case with convolutional neural networks, whose first experiments have shown promising results for species distribution modeling. Convolutional neural networks are known for their particularly high performance in all image processing tasks (classification, object detection, counting, etc.). They are able to use very high-dimensional data with little risk of overfitting. Even more than other machine learning approaches, these models are often described as black boxes that are difficult to interpret. We propose to study the use of these models, called Deep-SDMs, in the context of species distribution prediction, paying particular attention to interpretation in order to highlight the potential benefits of this new approach while trying to clarify the mechanisms involved. We present the use and analysis of Deep-SDMs with several interpretability experiments in different contexts. We compare them on some aspects with more state-of-the-art models. We propose qualitative and quantitative analyses of the interpretation of what Deep-SDMs learn. In particular, we propose to study what the model captures, either by analyzing the differences in performance according to the data used and the information they contain, or directly by studying the learned representation space of the model (the last layer of the model). Overall, we show that it is possible to analyze and interpret model learning in several ways, leading to interesting ecological conclusions. We show the promising potential of Deep-SDMs, which offer: (1) learning a single model for many species simultaneously, using observation data without absence data, (2) using more complex and richer representations of the environment thanks to their ability to use very high-dimensional data, (3) often better performance than other models, especially on rare species, (4) learning at a very large scale (thousands of species and regions the size of countries) and at a very fine resolution (around ten meters) thanks to remote sensing data, and (5) possible reuse of models in similar contexts, taking advantage of the learning already done.

    Location-based species recommendation using co-occurrences and environment- GeoLifeCLEF 2018 challenge

    This paper presents several approaches for predicting plants from their location in the context of the GeoLifeCLEF 2018 challenge. We developed three kinds of prediction models: a convolutional neural network (CNN) on environmental data, a neural network on co-occurrence data, and two models based only on the spatial occurrences of species (a closest-location classifier and a random forest fitted on the spatial coordinates). We also evaluated the combination of these models through two different late fusion methods (one based on predictive probabilities and the other based on predictive ranks). Results show the effectiveness of the CNN, which obtained the best prediction score of the whole GeoLifeCLEF challenge. The fusion of this model with the spatial ones provides only slight improvements, suggesting that the CNN already captured most of the spatial information in addition to the environmental preferences of the plants.
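
    The two late fusion strategies mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact fusion formulas, so plain averaging of probabilities and of ranks is assumed here; the function names, species names and scores are all hypothetical and serve only as an illustration.

```python
from collections import defaultdict


def fuse_by_probability(model_outputs):
    """Average the probability each model assigns to each species.

    Assumption: a species missing from a model's output counts as 0.
    """
    totals = defaultdict(float)
    for probs in model_outputs:
        for species, p in probs.items():
            totals[species] += p
    n = len(model_outputs)
    return {species: total / n for species, total in totals.items()}


def fuse_by_rank(model_outputs):
    """Average the rank (1 = most likely) each model gives to each species."""
    all_species = set()
    for probs in model_outputs:
        all_species.update(probs)
    totals = defaultdict(float)
    for probs in model_outputs:
        # Sort by decreasing probability; ties are broken alphabetically.
        ordered = sorted(all_species, key=lambda s: (-probs.get(s, 0.0), s))
        for rank, species in enumerate(ordered, start=1):
            totals[species] += rank
    n = len(model_outputs)
    return {species: total / n for species, total in totals.items()}


def top_k(scores, k, lower_is_better=False):
    """Return the k best species for a fused score dictionary."""
    sign = 1 if lower_is_better else -1
    return sorted(scores, key=lambda s: (sign * scores[s], s))[:k]


# Hypothetical outputs of two models for one location.
cnn = {"Quercus robur": 0.90, "Fagus sylvatica": 0.06, "Pinus pinaster": 0.04}
spatial = {"Quercus robur": 0.10, "Fagus sylvatica": 0.50, "Pinus pinaster": 0.40}

prob_fused = fuse_by_probability([cnn, spatial])
rank_fused = fuse_by_rank([cnn, spatial])

print(top_k(prob_fused, 1))                        # best species by mean probability
print(top_k(rank_fused, 1, lower_is_better=True))  # best species by mean rank
```

    In this toy example the two strategies disagree: probability fusion is dominated by the confident model, whereas rank fusion ignores how confident each model is and only uses the ordering.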

    Participation of LIRMM / Inria to the GeoLifeCLEF 2020 challenge

    This paper describes the methods that we implemented for the GeoLifeCLEF 2020 machine learning challenge. The goal of this challenge is to advance the state of the art in location-based species recommendation on a very large dataset of 1.9 million species observations, paired with high-resolution remote sensing imagery, land cover data, and altitude. We provide a detailed description of the algorithms and methodology developed by the LIRMM / Inria team in order to facilitate the understanding and reproducibility of the results.

    How do deep convolutional SDM trained on satellite images unravel vegetation ecology?

    Species distribution models (SDMs) assess and predict how the spatial distributions of species depend on the environment, owing to species' ecological preferences. These models are used in many different scenarios, such as conservation plans or the monitoring of invasive species. The choice of a model and of environmental data has a strong impact on the model's ability to capture important ecological information. Specifically, state-of-the-art models generally rely on local, point-based environmental information and do not take into account environmental variation in the surrounding landscape. Here we use a convolutional neural network model to analyze and predict species distributions from high-resolution data including remote sensing images, land cover and altitude. We show that the model unravels the functional response of vegetation to both local and large-scale environmental variation. To demonstrate the ecological significance of the results, we propose an original statistical analysis of t-SNE nonlinear dimension reduction. We illustrate and test the traits-species-environment relationships learned by the model and expressed in t-SNE dimensions.

    Very High Resolution Species Distribution Modeling Based on Remote Sensing Imagery: How to Capture Fine-Grained and Large-Scale Vegetation Ecology With Convolutional Neural Networks?

    Species Distribution Models (SDMs) are fundamental tools in ecology for predicting the geographic distribution of species based on environmental data. They are also very useful in applications, whether for implementing conservation plans for threatened species or for monitoring invasive species. The generalizability and spatial accuracy of an SDM depend very strongly on the type of model used and on the environmental data used as explanatory variables. In this paper, we study a country-wide species distribution model based on very high resolution (1 m) remote sensing images processed by a convolutional neural network. We demonstrate that this model can capture landscape and habitat information at very fine spatial scales, while providing overall better predictive performance than conventional models. Moreover, to demonstrate the ecological significance of the model, we propose an original analysis based on the t-SNE dimension reduction technique. It allows visualising the relations, learned by the model, between input data and species traits or environment, as well as conducting statistical tests to verify them. We also analyse the spatial mapping of the t-SNE dimensions at both national and local levels, showing the benefit of the model in automatically learning environmental variation at multiple scales.